John Foreman, Data Scientist

The Perilous World of Machine Learning for Fun and Profit: Pipeline Jungles and Hidden Feedback Loops

1/5/2015

1 Comment

 
I haven't written a blog post in ages. And while I don't want to give anything away, the main reason I haven't been writing is that I've been too busy doing my day job at MailChimp. The data science team has been working closely with others at the company to do some fun things in the coming year.

That said, I got inspired to write a quick post by this excellent short paper out of Google, "Machine Learning: The High Interest Credit Card of Technical Debt."

Anyone who plans on building production mathematical modeling systems for a living needs to keep a copy of that paper close.

And while I don't want to recap the whole paper here, I want to highlight some pieces of it that hit close to home.



Data Privacy, Machine Learning, and the Destruction of Mysterious Humanity

2/22/2014

7 Comments

 
Recently, I wrote an article about Disney’s new RFID location and transaction tracking technology, the MagicBand. Perhaps more magical for Walt than it is for you, the band allows Disney to track their customers’ actions inside their parks (and possibly outside). Where you walk, what you eat, when you stop to borderline-abusively yell at your kids. All that magic gets tracked.

This personal data is then used to deliver individually customized experiences to park-goers, and as a by-product, Disney gets to do all sorts of analysis on the data to figure out how to squeeze you for all you’re worth.

My personal tale with the MagicBands is one of pirates. My kids rode Pirates of the Caribbean all day, so when they saw Mickey, he talked not about Buzz or about Peter Pan but about Jack Sparrow. Bam! Big data in action. Mickey knows.

This kind of tracking is unnerving for some. Indeed, one of my post’s readers called me an asshole for so flippantly discussing the topic. 



Data science is crack, not milk. Act like it.

7/5/2013

4 Comments

 
Data scientists, we are our own worst enemies.

Let me whip out a little Torah here for a moment to explain. When the Israelites fled Egypt, they were pretty stoked about going to the promised land. But then after wandering for a while in the desert, they started to doubt that this whole desert-wandering was worth the trouble. And they began to miss their lives back in Egypt:
"remember the fish we ate in Egypt at no cost—also the cucumbers, melons, leeks, onions and garlic." -Numbers 11:5

Who WOULDN'T miss melons with garlic...wait, I must be reading that wrong.


Don't Forget the "What" and "Why" in Big Data

5/13/2013

0 Comments

 
I'm a film nut. The irritating kind who tries to convince you that Tinker, Tailor, Soldier, Spy wasn't the most boring movie of all time.

And back in college, I got this idea in my head that I wanted to be a director. That's where all the predictable guys end up once they realize they'll never be an athlete or a guitarist.

Excited by the prospect of producing the next Sneakers (best. film. ever.), I set about gathering equipment. I built my own steadicam out of parts from Lowe's. I secured a wheelchair to use for tracking shots. I knew how I'd do the long, fluid takes that eventually would become synonymous with "Foreman."

The problem was that none of this really mattered. Because I didn't know what I wanted to shoot or why I wanted to shoot it. Creating even the bones of a story, much less a compelling narrative, didn't really interest me. No, all I cared about was the How. How I would shoot the film. I obsessed over the tools. 

And ultimately, I never shot one scene.

This is exactly where we find ourselves in the world of big data today. The proliferating vendors who drive the conversation at conferences and on tech blogs are concerned primarily with the How. That's their business: building technology and providing services to make your big data fantasy a reality. It's your job, not theirs, to articulate whatever that fantasy is.

Whenever I meet other data science practitioners, I listen carefully to how they introduce their work.

If they say something like, "We're using some cool technology to do X," and then they proceed to tell me about this X they're doing, then I know this person's project stands a fighting chance. They know what they're building.

But when I hear, "We're doing some cool stuff using X technology," and then they proceed to tell me about their stack, I get a little nervous. Can they even define "cool stuff" or are they just tinkering?

Now, I get that you need to choose wisely the technologies you use to solve a problem. But the exciting part should be the business that's being done. So many folks are being pressured into doing a project, ANY PROJECT WILL DO, that uses Hadoop. Because their boss's boss wants a report on how the company is "doing big data." This is a regrettable situation. Not every business needs to do Big Data, which is why I really appreciated Evan Miller's grounded post on predictive analytics last week.

If I can't clearly articulate to my peers what my analytics project is and why I'm doing it, then forget everything else. Hell, that's why I'm writing an analytics book completely in spreadsheets -- because I'm tired of the tool discussion. When you use the most vanilla tools, the business problems come back into view.

Since my failed movie venture, I've swung in completely the other direction. I obsess over what business problems I should be solving with analytics and why they need solving. What does it get my company (MailChimp), and how does it help our customers?

Can you articulate the business problem that you're throwing software, talent, and hardware at? Or are you just buying tools that are looking for a use?


Your analytics talent pool is not made up of misanthropes

2/20/2013

0 Comments

 
To quote Pride and Prejudice, businesses have for many years “labored under the misapprehension” that their analytics talent was made up of misanthropes with neither the will nor the ability to communicate or work with others on strategic or creative business problems. These employees were meant to be kept in the basement out of sight, fed bad pizza, and pumped for spreadsheets to be interpreted in the sunny offices aboveground.

This perception is changing in industry as the big data phenomenon has elevated data science to a C-level priority. Suddenly folks once stereotyped by characters like Milton in Office Space are now “sexy.” The truth is there have always been well-rounded, articulate, friendly analytics professionals (they may just like Battlestar more than you), and now that analytics is an essential business function, personalities of all types are being attracted to practice the discipline.

Yet, despite this evolution both in talent and perception, many employees, both peers and managers, still treat their analytics counterparts in ways that erode effective analytics practice within an organization. Here are 5 things to keep in mind as you interact with your analytics colleagues in the future:

1) Analytics is not a one-way conversation. If you’re going to ask a data scientist to study demand drivers or task your analysts to pull some aggregate data from the Hadoop cluster, try not to just “take the data and run.” Analysts are humans, not a “layer” on top of your database so that MBAs can extract data. A data scientist is not a high-priced Mechanical Turk.

Remember to communicate why you need the data you need. And later, when that data has come to some use, you should check back in with the analyst to let them know that their efforts did not go to waste. I’ve seen organizations suffer from an analytics “throttling” effect where analysts will cease or slow down their work for a particular manager or peer, because they think the manager never does anything with the data. Maybe the manager doesn’t, or maybe the manager just doesn’t check back in to let the analyst know the outcome of their work.

Data scientists don’t like data for its own sake. They like it for what it can do. So keep them in the loop.

2) Give credit where credit is due. Let’s say your data scientist performs a study showing how “user-agent of the customer visiting the website is predictive of conversion” or “we can target customers with product recommendations based on the purchases of their nearest neighbors.” You then take this study and turn it into profit. The data scientist should receive some of the credit for having contributed to this work. It seems like common sense, but many businesses think that crediting an analyst is like crediting the database they used. You wouldn’t give credit to Hadoop for your great strategic idea, so why would you give it to this curmudgeonly analyst? Data doesn’t become insight on its own. Someone had to craft those insights out of a pile of ugly transactional records, so give that person a pat on the back.

3) Allow analytics professionals to speak. Just because you may not have a knack for math does not mean that your analyst isn’t adept at communicating. Allowing an analyst to present their own work gives them a sense of ownership and belonging within the organization. Some analysts may not want to communicate. That’s fine. But you’d be surprised how many would love to be part of the conversation if only they were given the chance. If they did the work, they might be able to better communicate the subtleties firsthand than an MBA could secondhand.

4) Don’t bring in your analytics talent too late. Often products and strategies are developed and launched by executives, managers, and marketers, and thrown in the wild long before someone thinks to ask the analyst, “Hey, how might we use data to make this product better? And how might we use the transactional data generated by this product to add value?” The earlier these questions get posed in the development cycle, the more impact analytics will have on the product in the long run.

Sure, you can’t do data science until you have data, but a slight variation in how you sell, market, or design a product may mean the difference between useable data later on and worthless data. Design, marketing, operations — there are many important considerations at the beginning of any product’s life. But don’t let that stop you from bringing the data scientist into the high-level strategic meetings. They might be able to shape the product to make it more profitable through predictive modeling, forecasting, or optimization. You don’t necessarily know what’s analytically possible. But they do.

5) Allow your scientists to get creative. When people think of creativity, they often think of the arts. But cognitively, there’s a lot of similarity between fine art and abstract algebra. Analytics professionals need instructions, projects, and goals just like all other employees, but that doesn’t mean they need to be told exactly what to do and how to do it 100% of the time.

Now that the world at large has realized products can be made from data or better sold through the judicious use of data, it’s in your best interest to give your analytics professionals some flexibility to see what they can dream up. Ask them to think about what problems lying about the business could be solved through analytics. Maybe it’s phone support prioritization, maybe it’s optimizing your supply chain or using predictive modeling in recruiting, maybe it’s revenue optimization through pricing — allow the analyst to think creatively about problems that seem outside their purview. It’ll keep them interested and engaged in the business, rather than feeling marginalized and stuck-in-the-basement.  A happy, engaged data scientist is a productive data scientist. And given how hard it is to recruit these professionals (they seem more like unicorns sometimes), hanging on to the talent you have is essential.


    Author

    Hey, I'm John, the data scientist at MailChimp.com.

    This blog is where I put thoughts about doing data science as a profession and the state of the "analytics industry" in general.

    Want to get even dirtier with data? Check out my blog "Analytics Made Skeezy", where math meets meth as fictional drug dealers get schooled in data science.

    Reach out to me on Twitter at @John4man
